A strategy to detect metabolic changes induced by exposure to chemicals from large sets of condition-specific metabolic models computed with enumeration techniques

Background The growing abundance of in vitro omics data, coupled with the necessity to reduce animal testing in the safety assessment of chemical compounds and even eliminate it in the evaluation of cosmetics, highlights the need for adequate computational methodologies. Data from omics technologies allow the exploration of a wide range of biological processes, therefore providing a better understanding of mechanisms of action (MoA) related to chemical exposure in biological systems. However, the analysis of these large datasets remains difficult due to the complexity of modulations spanning multiple biological processes. Results To address this, we propose a strategy to reduce information overload by computing, based on transcriptomics data, a comprehensive metabolic sub-network reflecting the metabolic impact of a chemical. The proposed strategy integrates transcriptomic data to a genome scale metabolic network through enumeration of condition-specific metabolic models hence translating transcriptomics data into reaction activity probabilities. Based on these results, a graph algorithm is applied to retrieve user readable sub-networks reflecting the possible metabolic MoA (mMoA) of chemicals. This strategy has been implemented as a three-step workflow. The first step consists in building cell condition-specific models reflecting the metabolic impact of each exposure condition while taking into account the diversity of possible optimal solutions with a partial enumeration algorithm. In a second step, we address the challenge of analyzing thousands of enumerated condition-specific networks by computing differentially activated reactions (DARs) between the two sets of enumerated possible condition-specific models. Finally, in the third step, DARs are grouped into clusters of functionally interconnected metabolic reactions, representing possible mMoA, using the distance-based clustering and subnetwork extraction method. The first part of the workflow was exemplified on eight molecules selected for their known human hepatotoxic outcomes associated with specific MoAs well described in the literature and for which we retrieved primary human hepatocytes transcriptomic data in Open TG-GATEs. Then, we further applied this strategy to more precisely model and visualize associated mMoA for two of these eight molecules (amiodarone and valproic acid). The approach proved to go beyond gene-based analysis by identifying mMoA when few genes are significantly differentially expressed (2 differentially expressed genes (DEGs) for amiodarone), bringing additional information from the network topology, or when very large number of genes were differentially expressed (5709 DEGs for valproic acid). In both cases, the results of our strategy well fitted evidence from the literature regarding known MoA. Beyond these confirmations, the workflow highlighted potential other unexplored mMoA. Conclusion The proposed strategy allows toxicology experts to decipher which part of cellular metabolism is expected to be affected by the exposition to a given chemical. The approach originality resides in the combination of different metabolic modelling approaches (constraint based and graph modelling). The application to two model molecules shows the strong potential of the approach for interpretation and visual mining of complex omics in vitro data. The presented strategy is freely available as a python module (https://pypi.org/project/manamodeller/) and jupyter notebooks (https://github.com/LouisonF/MANA). Supplementary Information The online version contains supplementary material available at 10.1186/s12859-024-05845-z.


Correction coefficient:
To maintain the biomass reaction balanced after setting the stoichiometry coefficient for the biomass_DNA reactant to zero, we applied the following correction coefficient to the stochiometric coefficient for all other reactants of the biomass reaction: Where   refers to the stoichiometric coefficient for the biomass_DNA reactant in the biomass reaction initially defined in Recon2.2[1].We applied this correction coefficient to all other metabolites (i.e biomass_lipid, biomass_carbohydrate, biomass_other, biomass_RNA and biomass_protein) of the Recon2.2biomass reaction so that the sum of the coefficients remains equal to one.

DAR specificity ratio:
To evaluate the degree of specificity of a Differentially Activated Reaction (DAR) signature, we compute the DAR specificity ratio.A DAR specificity ratio of 100% means that all the predicted DARs are retrieved only for the studied molecule, meaning that the metabolic Mechanism of Action (mMoA) of this molecule is unique compared to mMoA of other studied molecules.

𝑫𝑨𝑹𝒔𝒑𝒆𝒄 𝒙 = 𝒏𝑺𝒑𝒆𝒄𝒊𝒇𝒊𝒄𝑫𝑨𝑹𝒔 𝒙 𝒏𝑻𝒐𝒕𝒂𝒍𝑫𝑨𝑹𝒔 𝒙
With   , the number of DARs retrieved only in the list of DARs predicted for the molecule X and   , the total number of DARs predicted for the molecule X.

DEXOM parameters:
The choice of parameters values is based on a compromise between the quality of returned solutions and the solving time.Indeed, we can compare the optimality score achieved by the algorithm to the theoretical optimality score of the constructed mixed integer linear problem which is the maximum adequacy score between the network topology of the GSMN and the list of active/inactive reactions obtained from the binarization of transcriptomic data.We selected the following parameters: • eps = 1e -2 • thr = 1e -5 • tlim = 600s • mipgaptol = 1e -3 eps and thr are the thresholds above which a reaction is considered active.eps represent, the activation threshold of activated reactions (according to transcriptomic data) and thr represents, the activation threshold for unweighted reactions (according to transcriptomic data).tlim is the parameter defining the maximum time allowed to solve a mixed integer linear problem.The mipgaptol parameter defines the maximal allowed gap between the theoretical maximal adequacy score and the adequacy score of the optimized solution allowing us to consider the solution as optimal.The trade-off between quality and running time described above is mainly affected by tlim and mipgaptol.Therefore, these two parameters might need to be carefully adjusted according to the complexity of the problem and the available computational resources.

Adapted version of Systematic sampling:
Systematic sampling aims at picking random samples in a uniform fashion over all the dataset [2].Here, the dataset consists of solutions enumerated with the Reaction-Enum partial enumeration method.We implemented this systematic sampling method because in Recon2.2,reactions are sorted in alphabetical order by default and some reactions names are informative of their function (e.g.EX_R… refers to Exchange reactions).Since the aim of Reaction-Enum is to iteratively block each reaction and if possible, find an optimal solution with this reaction blocked, the order of solutions returned by Reaction-Enum is impacted by this reaction naming bias.Therefore, to avoid picking an imbalanced set of starting solutions (i.e. a set of solutions not representative of the diversity of possible solutions computed by Reaction-Enum), we implemented a systematic sampling approach.
First, we compute the number of batches according to the size of the range defined at the beginning of the batch computation of the Reaction-Enum step.Then for each batch we pick a random solution in the corresponding range of Reaction-Enum solutions.This solution will then be used as starting point of a Diversity-Enum batch.As an example, in the case where we have 4500 solutions obtained by Reaction-Enum and a range of 100 solutions per batch, the starting solution for the first Diversity-Enum batch will be picked between the first and 99 th Reaction-Enum solutions, the starting solution for the second Diversity-Enum batch will be picked between the 100 th and the 199 th Reaction-Enum solutions and so on until we picked a starting solution for each batch, the last one being between the 4400 th and the 4500 th Reaction-Enum solutions

Impact of DARs filtration on pathway enrichment results:
Pathway over-representation analysis (ORA) is size sensitive [3,4].Therefore, keeping reactions that are not functionally informative (i.e.sink, pool and exchange reactions) can lead to important differences in pathway ORA.For instance, when keeping such reactions, only one significantly enriched pathway (Transport, extracellular) was evidenced for valproic acid (S3 Fig), whereas 12 were identified when removing these reactions.These results suggest that DARs lists should be carefully processed in order to remove such reactions before performing an over-representation analysis.Interestingly, the graph-based analysis developed in this study is less impacted by keeping or removing these reactions since they are usually clustered together.

Robustness of the results to the random-selection of starting solutions
Due to the high computation cost of DEXOM, we adapted the sampling strategy by using only 1% of the solutions enumerated by the reaction-enum approach as starting solutions for the diversity-enum approach.To assess the robustness of the results after this adaptation we performed 5 runs of our approach, by randomly selecting a different set of 1% reactions in each run.This test was done on two conditions: amiodarone, control, 24hr and amiodarone, high dose, 24hr.
We first checked that the overlap between randomly selected sets across the 5 runs was low, so that we would be able to truly assess the impact of the stochasticity of the random selection process on the robustness and consistency of our results.Table 1 and Table 2 show that the overlap of randomly selected reaction-enum solutions is always lower than 5%, highlighting that starting solutions for each of the 5 runs are indeed very different.2. Overlap between the solutions from reaction-enum selected in each run for the amiodarone high dose condition at 24hr.
We next compared the overlap between DARs calculated from each run.The results (Table 3) show that the solutions enumerated by the diversity-enumeration approach are very similar across the 5 runs, with a lowest overlap of 97.2%, which corresponds to only 1 DAR being different between the 2 runs.These simulations show that our adaptation of DEXOM does not significantly impact the diversity of the enumerated solution, and we therefore consider that this adaptation is relevant to allow a drastic reduction in computing time while providing robust and consistent results.

Table 1 .
Overlap between the solutions from reaction-enum selected in each run for the amiodarone control condition at 24hr.

Table 3 .
Overlap between predicted DARs in each run for the amiodarone condition.